fixup! multi-pack-index: implement 'expire' verb#106
Closed
derrickstolee wants to merge 1 commit into
Closed
Conversation
There is a bug in the 'git multi-pack-index expire' subcommand. When expiring packs while also adding a pack not previously covered by the multi-pack-index, the logic around the pack permutations did not work correctly. It would improperly assign objects to the wrong pack-int-ids. Rework the logic around expired packs in the following ways: 1. We track the original order of the packs to delete, but do not remove them from the list right away. 2. We append the new packs to the total list of packs. 3. After sorting the full list of packs, we remove the expired packs from the list so we do not write them in the new multi-pack-index. 4444. Finally, we "puncture" the pack permutation to remove the entries that will be deleted. This requires reducing the int-ids for the packs that have final int-id larger than the expected int-id for the expired pack.
Author
|
This solution is too complicated and hard to be sure is correct. Closing until I rewrite it in a better, clearer way. |
| packs_to_drop->items[drop_index].string); | ||
|
|
||
| if (!cmp) { | ||
| expire_int_ids[drop_index] = i; |
There was a problem hiding this comment.
GH won't let me make a comment on a non delta line, so this is the closest line.
On line 812 below, incrementing drop_index will leave a zero cell in expire_int_ids[].
Can't tell yet if that is a good thing or a bad thing.
There was a problem hiding this comment.
Never mind. looks like you throw an error at 830 when that happens.
| int drop_index = 0; | ||
| int j; | ||
|
|
||
| /* truncate the list of packs */ |
There was a problem hiding this comment.
nit: the word "truncate" is misleading here. to me that says to lop of the end. but you're collapsing the newly created holes.
| } | ||
|
|
||
| for (j = packs_to_drop->nr - 1; j >= 0; j--) { | ||
| int expired_int_id = packs.perm[expire_int_ids[j]]; |
There was a problem hiding this comment.
nit: maybe "int permuted_int_id" ?
derrickstolee
added a commit
that referenced
this pull request
Jan 4, 2019
This replaces #106. That PR was complicated and difficult to understand because we didn't use structured data, but instead relied on our simple arrays and overloaded that data. This is a bigger change, but results in code that is (hopefully) easier to understand. The new flow for writing a multi-pack-index is as follows: 1. Construct a list of `midx_info` structs that contain the details of the packs. This list starts with the packs in the existing midx, followed by the new packs to add. Keep track of the `orig_pack_int_id` for these packs. 2. Construct the list of object entries. The `pack_int_id` we use here corresponds to the `orig_pack_int_id` for the pack we are using. 3. Sort the packs by name. 4. If we have packs to drop, identify where they are in the list of packs. We can use the sorted nature of the list to know we will find them in the correct order. 5. Determine the `new_pack_int_id` for each `struct midx_info` by tracking how many are dropped by that point in the list. 6. Construct a new permutation array that maps from `orig_pack_int_id` to `new_pack_int_id`. If the pack is expired, then the value used here is invalid and will error if any object tries to use that value. 7. Count the length of the pack names we will write, and modify the length to be properly aligned if necessary. 8. Write the midx as usual, tracking that we have `packs.nr - drop_count` packs to write. 9. When writing the object offsets, use `packs.perm` to translate from the `orig_pack_int_id` to `new_pack_int_id`. While this PR is just one giant commit, I will peel parts across multiple commits for upstream. These will be interleaved with the commits already in `microsoft/git:master`.
dscho
pushed a commit
that referenced
this pull request
Feb 27, 2019
This replaces #106. That PR was complicated and difficult to understand because we didn't use structured data, but instead relied on our simple arrays and overloaded that data. This is a bigger change, but results in code that is (hopefully) easier to understand. The new flow for writing a multi-pack-index is as follows: 1. Construct a list of `midx_info` structs that contain the details of the packs. This list starts with the packs in the existing midx, followed by the new packs to add. Keep track of the `orig_pack_int_id` for these packs. 2. Construct the list of object entries. The `pack_int_id` we use here corresponds to the `orig_pack_int_id` for the pack we are using. 3. Sort the packs by name. 4. If we have packs to drop, identify where they are in the list of packs. We can use the sorted nature of the list to know we will find them in the correct order. 5. Determine the `new_pack_int_id` for each `struct midx_info` by tracking how many are dropped by that point in the list. 6. Construct a new permutation array that maps from `orig_pack_int_id` to `new_pack_int_id`. If the pack is expired, then the value used here is invalid and will error if any object tries to use that value. 7. Count the length of the pack names we will write, and modify the length to be properly aligned if necessary. 8. Write the midx as usual, tracking that we have `packs.nr - drop_count` packs to write. 9. When writing the object offsets, use `packs.perm` to translate from the `orig_pack_int_id` to `new_pack_int_id`. While this PR is just one giant commit, I will peel parts across multiple commits for upstream. These will be interleaved with the commits already in `microsoft/git:master`.
dscho
pushed a commit
that referenced
this pull request
Mar 29, 2019
This replaces #106. That PR was complicated and difficult to understand because we didn't use structured data, but instead relied on our simple arrays and overloaded that data. This is a bigger change, but results in code that is (hopefully) easier to understand. The new flow for writing a multi-pack-index is as follows: 1. Construct a list of `midx_info` structs that contain the details of the packs. This list starts with the packs in the existing midx, followed by the new packs to add. Keep track of the `orig_pack_int_id` for these packs. 2. Construct the list of object entries. The `pack_int_id` we use here corresponds to the `orig_pack_int_id` for the pack we are using. 3. Sort the packs by name. 4. If we have packs to drop, identify where they are in the list of packs. We can use the sorted nature of the list to know we will find them in the correct order. 5. Determine the `new_pack_int_id` for each `struct midx_info` by tracking how many are dropped by that point in the list. 6. Construct a new permutation array that maps from `orig_pack_int_id` to `new_pack_int_id`. If the pack is expired, then the value used here is invalid and will error if any object tries to use that value. 7. Count the length of the pack names we will write, and modify the length to be properly aligned if necessary. 8. Write the midx as usual, tracking that we have `packs.nr - drop_count` packs to write. 9. When writing the object offsets, use `packs.perm` to translate from the `orig_pack_int_id` to `new_pack_int_id`. While this PR is just one giant commit, I will peel parts across multiple commits for upstream. These will be interleaved with the commits already in `microsoft/git:master`.
dscho
pushed a commit
that referenced
this pull request
May 25, 2019
This replaces #106. That PR was complicated and difficult to understand because we didn't use structured data, but instead relied on our simple arrays and overloaded that data. This is a bigger change, but results in code that is (hopefully) easier to understand. The new flow for writing a multi-pack-index is as follows: 1. Construct a list of `midx_info` structs that contain the details of the packs. This list starts with the packs in the existing midx, followed by the new packs to add. Keep track of the `orig_pack_int_id` for these packs. 2. Construct the list of object entries. The `pack_int_id` we use here corresponds to the `orig_pack_int_id` for the pack we are using. 3. Sort the packs by name. 4. If we have packs to drop, identify where they are in the list of packs. We can use the sorted nature of the list to know we will find them in the correct order. 5. Determine the `new_pack_int_id` for each `struct midx_info` by tracking how many are dropped by that point in the list. 6. Construct a new permutation array that maps from `orig_pack_int_id` to `new_pack_int_id`. If the pack is expired, then the value used here is invalid and will error if any object tries to use that value. 7. Count the length of the pack names we will write, and modify the length to be properly aligned if necessary. 8. Write the midx as usual, tracking that we have `packs.nr - drop_count` packs to write. 9. When writing the object offsets, use `packs.perm` to translate from the `orig_pack_int_id` to `new_pack_int_id`. While this PR is just one giant commit, I will peel parts across multiple commits for upstream. These will be interleaved with the commits already in `microsoft/git:master`.
dscho
pushed a commit
that referenced
this pull request
May 27, 2019
This replaces #106. That PR was complicated and difficult to understand because we didn't use structured data, but instead relied on our simple arrays and overloaded that data. This is a bigger change, but results in code that is (hopefully) easier to understand. The new flow for writing a multi-pack-index is as follows: 1. Construct a list of `midx_info` structs that contain the details of the packs. This list starts with the packs in the existing midx, followed by the new packs to add. Keep track of the `orig_pack_int_id` for these packs. 2. Construct the list of object entries. The `pack_int_id` we use here corresponds to the `orig_pack_int_id` for the pack we are using. 3. Sort the packs by name. 4. If we have packs to drop, identify where they are in the list of packs. We can use the sorted nature of the list to know we will find them in the correct order. 5. Determine the `new_pack_int_id` for each `struct midx_info` by tracking how many are dropped by that point in the list. 6. Construct a new permutation array that maps from `orig_pack_int_id` to `new_pack_int_id`. If the pack is expired, then the value used here is invalid and will error if any object tries to use that value. 7. Count the length of the pack names we will write, and modify the length to be properly aligned if necessary. 8. Write the midx as usual, tracking that we have `packs.nr - drop_count` packs to write. 9. When writing the object offsets, use `packs.perm` to translate from the `orig_pack_int_id` to `new_pack_int_id`. While this PR is just one giant commit, I will peel parts across multiple commits for upstream. These will be interleaved with the commits already in `microsoft/git:master`.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
There is a bug in the 'git multi-pack-index expire' subcommand. When
expiring packs while also adding a pack not previously covered by
the multi-pack-index, the logic around the pack permutations did not
work correctly. It would improperly assign objects to the wrong
pack-int-ids.
Rework the logic around expired packs in the following ways:
We track the original order of the packs to delete, but do not
remove them from the list right away. (This is stored as
expire_int_ids.)We append the new packs to the total list of packs.
After sorting the full list of packs, we remove the expired
packs from the list so we do not write them in the new
multi-pack-index.
Finally, we "puncture" the pack permutation to remove the
entries that will be deleted. This requires reducing the int-ids
for the packs that have final int-id larger than the expected
int-id for the expired pack.